Enhancing Biomedical Text Summarization and Question-Answering: On the Utility of Domain-Specific Pre-Training
Training models for biomedical text summarization requires large datasets. We
show that while transfer learning offers a viable option for
addressing this challenge, in-domain pre-training does not always offer
advantages in a BioASQ summarization task. We identify a suitable model
architecture and use it to show the benefit of general-domain pre-training
followed by task-specific fine-tuning in the context of a BioASQ
summarization task, leading to a novel three-step fine-tuning approach that
works with only a thousand in-domain examples. Our results indicate that a
Large Language Model without domain-specific pre-training can have a
significant edge in some domain-specific biomedical text generation tasks.
Will This Video Go Viral? Explaining and Predicting the Popularity of Youtube Videos
What makes content go viral? Which videos become popular, and why do others
not? Such questions have attracted significant attention from both researchers
and industry, particularly in the context of online media. A range of models
have recently been proposed to explain and predict popularity; however, there
are few practical tools, accessible to regular users, that leverage these
theoretical results. HIPie, an interactive visualization system, was created
to fill this gap by enabling users to reason about the virality and the
popularity of online videos. It retrieves the metadata and the past popularity
series of YouTube videos, employs the Hawkes Intensity Process (a
state-of-the-art online popularity model) to explain and predict video
popularity, and presents videos comparatively in a series of interactive
plots. This system will help both content consumers and content producers in a
range of data-driven inquiries, such as comparatively analyzing videos and
channels, explaining and predicting future popularity, identifying viral
videos, and estimating the response to online promotion.
Comment: 4 pages
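The Hawkes Intensity Process links an exogenous stimulus (such as shares or promotion) to the resulting popularity through a self-exciting, power-law memory. The following is a minimal discrete-time sketch, not HIPie's code: it assumes the form xi[t] = mu * s[t] + C * sum_{tau=1..t} xi[t - tau] * (tau + c)^-(1 + theta), where s is a daily stimulus series and xi the predicted daily popularity. The function name `hip_response` and its parameter names are illustrative.

```python
import numpy as np


def hip_response(shares, mu, C, c, theta):
    """Discrete-time sketch of a Hawkes Intensity Process (HIP) response.

    Assumed form (illustrative, not taken from HIPie):
        xi[t] = mu * s[t] + C * sum_{tau=1..t} xi[t - tau] * (tau + c) ** -(1 + theta)
    s[t] is the exogenous stimulus (e.g. daily shares); xi[t] is the
    predicted popularity (e.g. daily views).
    """
    s = np.asarray(shares, dtype=float)
    xi = np.zeros_like(s)
    # Power-law memory kernel evaluated at lags tau = 1 .. len(s) - 1
    lags = np.arange(1, len(s))
    kernel = (lags + c) ** -(1.0 + theta)
    for t in range(len(s)):
        # Past popularity, most recent first: xi[t-1], xi[t-2], ..., xi[0]
        past = xi[:t][::-1]
        # Endogenous part: past popularity weighted by the decaying kernel
        endogenous = C * np.dot(past, kernel[:t])
        # Exogenous part: stimulus scaled by mu
        xi[t] = mu * s[t] + endogenous
    return xi


if __name__ == "__main__":
    # A single burst of promotion on day 0, then nothing
    print(hip_response([1, 0, 0, 0, 0], mu=2.0, C=0.5, c=1.0, theta=1.0))
```

Setting C = 0 removes the endogenous term, so the response is just mu * s[t]. A larger C or a smaller theta makes past attention echo for longer, which is the kind of behavior one would inspect when judging a video's response to promotion.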
Efficient Non-parametric Bayesian Hawkes Processes
In this paper, we develop an efficient non-parametric Bayesian estimation of
the kernel function of Hawkes processes. The non-parametric Bayesian approach
is important because it provides flexible Hawkes kernels and quantifies their
uncertainty. Our method is based on the cluster representation of Hawkes
processes. Utilizing the stationarity of the Hawkes process, we efficiently
sample random branching structures and thus split the Hawkes process into
clusters of Poisson processes. We derive two algorithms (a block Gibbs
sampler and a maximum a posteriori estimator based on expectation
maximization) and show that our methods have linear time complexity, both
theoretically and empirically. On synthetic data, we show that our methods
can infer flexible Hawkes triggering kernels. On two large-scale Twitter
diffusion datasets, we show that our methods outperform the current
state of the art in goodness-of-fit and that the time complexity is linear in
the size of the dataset. We also observe that on diffusions related to online
videos, the learned kernels reflect the perceived longevity of different
content types, such as music or pet videos.
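The cluster representation treats each event either as an immigrant from a background Poisson process or as the offspring of an earlier event. A minimal sketch of one Gibbs-style step follows, under assumptions that are ours rather than the paper's: the kernel is a fixed function passed in (the paper learns it non-parametrically), the background rate mu has a Gamma(a, b) prior (shape a, rate b), and parents are sampled naively in O(n^2) time rather than with the linear-time scheme the abstract reports. The names `sample_branching` and `sample_background_rate` are hypothetical.

```python
import numpy as np


def sample_branching(times, mu, kernel, rng):
    """Sample a parent for each event of a Hawkes process.

    times must be sorted ascending; mu > 0 is the background rate;
    kernel(dt) maps an array of positive lags to triggering intensities.
    Returns parents, where parents[i] == -1 marks an immigrant (background
    event) and otherwise is the index of the earlier event that triggered i.
    Naive O(n^2) version, for illustration only.
    """
    times = np.asarray(times, dtype=float)
    parents = np.empty(len(times), dtype=int)
    for i, t in enumerate(times):
        # Candidate parents: the background process plus every earlier event
        weights = np.concatenate(([mu], kernel(t - times[:i])))
        probs = weights / weights.sum()
        choice = rng.choice(i + 1, p=probs)
        # Index 0 is the background process; shift so it maps to -1
        parents[i] = choice - 1
    return parents


def sample_background_rate(parents, T, a, b, rng):
    """Conjugate update for mu under an assumed Gamma(a, b) prior.

    Immigrants form a homogeneous Poisson process on [0, T], so the
    posterior is Gamma(a + n_immigrants, b + T) (shape, rate).
    """
    n_immigrants = int(np.sum(parents == -1))
    # NumPy parameterizes the gamma distribution by shape and scale = 1 / rate
    return rng.gamma(shape=a + n_immigrants, scale=1.0 / (b + T))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    events = np.sort(rng.uniform(0.0, 10.0, 20))
    # Assumed exponential kernel, used here only as a stand-in
    parents = sample_branching(events, 0.5, lambda dt: 0.8 * np.exp(-dt), rng)
    print(parents, sample_background_rate(parents, 10.0, 1.0, 1.0, rng))
```

Once a branching structure is sampled, the offspring gaps (child time minus parent time) are the data from which a kernel estimate would be updated. That step is where the paper's non-parametric kernel model would come in, and it is not reproduced here.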
Hawkes-modeled telecommunication patterns reveal relationship dynamics and personality traits
It is not news that our mobile phones contain a wealth of private information
about us, which is why we try to keep them secure. But even the traces of
how we communicate can tell quite a bit about us. In this work, we start
from the calling and texting history of 200 students enrolled in the Netsense
study, and we link it to the types of relationships the students have with
their peers, and even to their personality profiles. First, we show that a
Hawkes point process with a power-law decaying kernel can accurately model the
calling activity between peers. Second, we show that the fitted parameters of
the Hawkes model are predictive of the relationship type, and that the
generalization error of the Hawkes process can be leveraged to detect changes
in relationship types as they happen. Last, we build descriptors for the
students in the study by jointly modeling the communication series they
initiate. We find that Hawkes-modeled telecommunication patterns can predict the
students' Big5 psychometric traits almost as accurately as user-filled
surveys pertaining to hobbies, activities, well-being, grades obtained, health
condition and the number of books they read. These results are significant, as
they indicate that information that usually lies outside the control of
individuals (such as call and text logs) reveals information about their
relationships, and even their personality traits.
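Fitting a Hawkes process to calling activity typically means maximizing its log-likelihood. A minimal sketch follows, assuming a univariate process on [0, T] with the power-law kernel phi(tau) = kappa * (tau + c)^-(1 + theta), with c > 0 and theta > 0. This parameterization and the name `powerlaw_hawkes_loglik` are illustrative assumptions, not the paper's exact model. The kernel integrates in closed form, so the compensator needs no numerical integration.

```python
import numpy as np


def powerlaw_hawkes_loglik(times, T, mu, kappa, c, theta):
    """Log-likelihood of a univariate Hawkes process observed on [0, T].

    Assumed intensity: lambda(t) = mu + sum_{t_j < t} kappa * (t - t_j + c) ** -(1 + theta),
    with mu > 0, kappa >= 0, c > 0, theta > 0. times must be sorted and lie in [0, T].
    """
    t = np.asarray(times, dtype=float)
    log_intensity = 0.0
    for i in range(len(t)):
        # Time elapsed since each earlier event
        lags = t[i] - t[:i]
        lam = mu + kappa * np.sum((lags + c) ** -(1.0 + theta))
        log_intensity += np.log(lam)
    # Compensator: integral of lambda over [0, T]. Each event contributes
    # kappa / theta * (c^-theta - (T - t_i + c)^-theta).
    compensator = mu * T + (kappa / theta) * np.sum(c ** -theta - (T - t + c) ** -theta)
    return log_intensity - compensator


if __name__ == "__main__":
    calls = [0.5, 0.7, 3.2, 3.3, 3.35, 8.0]
    print(powerlaw_hawkes_loglik(calls, T=10.0, mu=0.3, kappa=0.4, c=0.1, theta=0.5))
```

The quadratic loop over earlier events keeps the sketch short. The negative of this function could be handed to a generic numerical optimizer to obtain parameter estimates such as (mu, kappa, c, theta) for one pair of peers.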